-
Notifications
You must be signed in to change notification settings - Fork 70
feat: add find field (by id and name) support to schema #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add find field (by id and name) support to schema #180
Conversation
nullccxsy
commented
Aug 15, 2025
- add insensitive way to find schemafield(list, struct, map)
- change the complexity of find name to O(1)
- test insensitive way to find schemafield(list, struct, map)
adc4817 to
ec73749
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3b0653a to
ae4515e
Compare
0394c01 to
ef4b2de
Compare
src/iceberg/schema.cc
Outdated
| NametoIdVisitor::GetPath(path, std::string(field.name()), case_sensitive_); | ||
| short_full_path = GetPath(short_path, std::string(field.name()), case_sensitive_); | ||
| name_to_id_[full_path] = field.field_id(); | ||
| name_to_id_.emplace(short_full_path, field.field_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the return value and error out if failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it fails, it means that the short name conflicts with the canonical name. In this case, the short name is discarded, so don't need to check the return value.
8319453 to
8ba65bf
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I think we are close to the finish line.
|
Please rename to title to |
|
Please note that the java impl has |
src/iceberg/schema.h
Outdated
| Result<Status> InitIdToIndexMap() const; | ||
| Result<Status> InitNameToIndexMap() const; | ||
| Result<Status> InitLowerCaseNameToIndexMap() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they "Ensure" rather than Init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May name it LazyIdToIndexMap ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose here is not "Init" ( Maybe reinitialize ) Mapping, it do Ensure the mapping to be "initialized", I guess EnusreIdToIndexMap() is ok to me.
Generally, LazyIdToIndexMap seems to be another api like Result<const map<...>&> LazyIdToIndexMap
mapleFU
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I feel a bit afraid about lazy initialize mapping, since it would be thread-unsafetly. But maybe it's not a severe problem here.
For lazy initialization, we may leverage |
So this requires a mutex and three once_flag? It might be huge but it's ok to me... |
ok, this will be fixed |
We can support this is another patch, currently we can add a todo here. |
0f85748 to
f0dbff4
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @nullccxsy!
d5f8d92 to
4df8eb7
Compare
- Add GetFieldByName method to support nested field queries with dot notation - Implement InitNameToIndexMap and InitIdToIndexMap for field mapping - Introduce SchemaFieldVisitor for schema traversal - Implement name pruning logic for element and value in nested structure
- Introduce IdVisitor for schema traversal to init id_to_index_(Map) - Introduce NameVisitor for schema traversal to init name_to_index_(Map), lowercase_name_to_index_(map)
…ava impl 1. Refactor IdVisitor: Add VisitNested() to handle NestedType, make Visit() generic for all Type hierarchy (avoid anti-pattern) 2. Merge id_to_index_ + full_schemafield_ into id_to_field_ (std::unordered_map<int32_t, const ref<SchemaField>>) to reduce memory footprint 3. Align with Java impl's idToField design (support future alias/identifier fields without refactor)
- change the type of id from size_t to int_32 - rename GetPath to BuildPath - fix bug new_short_path should build from short_path, not path
…sult call in macros - Added logic to detect and handle duplicate names in NameToIdVisitor to prevent invalid schema errors. - Fixed duplicate result call issue in src/iceberg/util/macros.h by optimizing ICEBERG_RETURN_UNEXPECTED macro.
… conversion function - Implemented detection for duplicate field IDs, Names in visitors, returning kInvalidSchema error with message. - Updated lowercase conversion in NameToIdVisitor
…reduce copies and unify initialization - Switched field_ and schema_ members in test classes to std::unique_ptr for unique ownership and to avoid duplicate and avoid creating duplicate objects.
… std::string creation
17bcac2 to
9aa221d
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @nullccxsy!
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this!